Abstract - Making agriculture sustainable and resilient to
ongoing changes in climate and social structure is a major
challenge for scientists and researchers across the globe.
Agricultural systems demand transition and a multidisciplinary
approach. Intelligent and precision agricultural approaches have
been given due importance for increasing production and
productivity from the same limited resources. These approaches
need information from various sources and its efficient use in
the relevant field. This need has led to growing interest in
knowledge discovery from the vast piles of data generated by
research and survey work. The emergence of Data Mining
techniques has revolutionized information generation and pattern
recognition. Though Data Mining is an emerging science, it
already finds wide application in agriculture and allied sectors
and has broad future prospects.
I. INTRODUCTION

Agriculture is in a transition stage driven by population
pressure and climate change. More production and productivity
are expected from limited resources. Intensive research is under
way to explore ways of increasing production through optimum use
of resources while maintaining sustainability. This has led to
the use of modern, sophisticated computer-assisted technologies
in agricultural research. Owing to the widespread use of
computers and affordable storage, an enormous wealth of data is
embedded in the huge databases of different agri-allied
enterprises. Megabytes, gigabytes, and terabytes of data are
piling up in the electronic vaults of organizations,
institutions and companies. With the advent of the internet and
the World Wide Web, the accessibility of data has increased a
thousandfold. These massive databases, co-evolving with new
research methodologies in agriculture and the widespread
application of information technology, could be a precious
repository of information for decision makers, from policy
makers and researchers to farmers. Besides the four economic
factors of production, viz. Land, Labor, Capital and
Entrepreneurial ability, 'information' emerges as a fifth vital
factor [1]. Particularly in agriculture and allied sectors, the
role of information is ever increasing. Information regarding
weather, soil, disease, insects, seed, fertilizer, market etc.
becomes an important input for the economic and sustainable
development of these sectors.

The tremendous amounts of data generated by these processes have
unexplored potential for improving the efficiency of the related
sectors. Techniques and technologies are needed to derive
information and knowledge from such gigantic data sets. In
scientific computing, the major problem is to infer valuable
information from observed data, especially in areas that
generate enormous amounts of data each day, such as satellite
remote sensing. Not much attention was given to these huge
repositories of information until the 1990s, owing to the lack
of efficient methods and techniques, and the vast data
accumulated during daily activities were simply dumped in
archival files.

Recent advances in analysis techniques and the advent of faster
analysis tools that can filter and analyze stockpiles of data,
turning up valuable and often surprising information, have
attracted the attention of researchers across the globe. This
led to the evolution of a new discipline in computer science -
Data Mining, which involves the exploration and analysis of
large data sets in order to discover meaningful patterns and
rules [2]. The major focus and intention are to use existing
data to establish new facts and to uncover relationships
previously unknown even to the experts. The basic idea is:
“when the same data are analyzed in a different context, new
context-based information and knowledge are generated”. Data
mining takes the evolutionary process beyond retrospective data
access and navigation to prospective and proactive information
delivery, and is supported by three technologies: massive data
collection, high performance computing and data mining
algorithms. Data mining involves multi-disciplinary approaches
and is a component of a wider process called knowledge discovery
from databases. It draws on work from areas including database
technology, machine learning, statistics, pattern recognition,
information retrieval, neural networks, knowledge-based systems,
artificial intelligence, high performance computing and data
visualization. It presents techniques for the discovery of
patterns hidden in large data sets, focusing on issues relating
to their feasibility, usefulness, effectiveness, and
scalability.

The objective of this review is to introduce briefly the
techniques of Data Mining and to outline their use in
agriculture and allied sectors. Being considerably more powerful
than the conventional data analysis techniques used in
agricultural research, data mining can open a new avenue for
research and development in agriculture and associated ventures.

II. TASKS OF DATA MINING

The main tasks of data mining are Classification, Estimation,
Prediction, Association rules, Clustering, and Description &
Visualization. The first three, Classification, Estimation and
Prediction, are examples of directed data mining or supervised
learning, in which the target is known in advance and the aim is
to describe the target attribute(s) in terms of the remaining
attributes. The next three, Association rules, Clustering and
Description & Visualization, are examples of undirected data
mining, in which the aim is to establish new relationships among
the attributes.
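
The distinction between directed and undirected mining can be
illustrated with a minimal Python sketch. It is not taken from
any of the cited studies: the records, attribute names and
labels are hypothetical, a nearest-centroid rule stands in for
"classification", and a single support/confidence count stands
in for "association rules".

```python
from collections import defaultdict
from math import dist

# Directed mining: the target attribute (a suitability label) is known.
# Hypothetical toy records: (soil pH, rainfall in dm) -> label.
labelled = [
    ((6.5, 9.0), "suitable"),
    ((6.8, 8.5), "suitable"),
    ((4.9, 3.0), "unsuitable"),
    ((5.1, 2.5), "unsuitable"),
]

def fit_centroids(records):
    """Average the attribute vectors of each class."""
    groups = defaultdict(list)
    for features, label in records:
        groups[label].append(features)
    return {
        label: tuple(sum(col) / len(col) for col in zip(*rows))
        for label, rows in groups.items()
    }

def classify(centroids, features):
    """Assign the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Undirected mining: no target; look for co-occurrence among attributes.
# Hypothetical field observations, one set of conditions per field.
observations = [
    {"high_humidity", "fungal_disease", "low_yield"},
    {"high_humidity", "fungal_disease"},
    {"high_humidity", "low_yield"},
    {"dry_spell", "low_yield"},
]

def rule_stats(rows, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n_antecedent = sum(1 for r in rows if antecedent in r)
    n_both = sum(1 for r in rows if antecedent in r and consequent in r)
    support = n_both / len(rows)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

centroids = fit_centroids(labelled)
predicted = classify(centroids, (6.2, 8.0))
support, confidence = rule_stats(observations, "high_humidity", "fungal_disease")
print(predicted, support, round(confidence, 2))
```

In the first half the label column drives the model; in the
second half no attribute is singled out as a target, and the
rule is judged only by how often its two sides co-occur.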

III. DATA MINING TECHNIQUES

Data mining is an emerging science, and new techniques specific
to each task are being developed. The techniques of data mining
are broadly classified under the following subjects -
Statistics, Machine Learning, Fuzzy Logic and Rough Sets.
Details of the techniques and their mathematics are available in
standard Data Mining textbooks.
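
As a small illustration of the Fuzzy Logic category, the sketch
below computes degrees of membership in triangular fuzzy sets.
It is only an assumed example: the set names and pH breakpoints
are hypothetical and are not drawn from the cited literature.

```python
def triangular(x, low, peak, high):
    """Degree of membership (0..1) of x in a triangular fuzzy set."""
    if x <= low or x >= high:
        return 0.0
    if x <= peak:
        return (x - low) / (peak - low)
    return (high - x) / (high - peak)

# Hypothetical fuzzy sets for soil pH; breakpoints are illustrative only.
ph_sets = {
    "acidic": (3.0, 4.5, 6.0),
    "neutral": (5.5, 7.0, 8.5),
    "alkaline": (8.0, 9.5, 11.0),
}

def fuzzify(ph):
    """Map a crisp pH reading to a membership degree in every set."""
    return {name: triangular(ph, *abc) for name, abc in ph_sets.items()}

memberships = fuzzify(5.8)
print(memberships)
```

Unlike a crisp threshold, a reading of 5.8 belongs partly to
both "acidic" and "neutral", which is the property fuzzy
techniques exploit when class boundaries are gradual.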

IV. APPLICATION OF DATA MINING TECHNIQUES IN AGRICULTURE

Data mining techniques find wide application in agriculture and
allied sectors. Lee et al. [3] used the knowledge discovery life
cycle (KDLC) model in a study dealing with crop yield and its
visualization using a Geographic Information System. The study
empirically verified the significance of the multi-strategy
knowledge discovery and visualization process in analyzing the
classifications and learned rules within KDLC. Bajwa et al. [4]
compared five methods of band selection (three unsupervised and
two supervised) for selecting signature bands from hyperspectral
imagery to characterize soil ECa and canopy density in
agricultural fields. The main focus of [5] was to study, using
the Data Mining technique of Fuzzy Logic, the functioning model
of a detritic aquifer undergoing overexploitation and excess
nitrate input from intensive strawberry and citrus crops in its
recharge zone. By studying large datasets, several simulation
models of soil dynamics have been developed, viz. DSSAT [6],
CROPSYST [7] and GLEAMS [8], to name a few.

Jain et al. [6] present the potential of three machine-learning
techniques, viz. decision tree induction using C4.5, Rough Sets,
and hybridized rough-set-based decision tree induction, over
traditional regression analysis for disease forecasting.
Forecasting based on machine learning was found to be more
accurate than the traditional methods. In their studies using
fuzzy logic, Meyer et al. [7] found that fuzzy techniques using
the ZGK algorithm could be potentially useful for remote
sensing, mapping, crop management, and weed and pest control in
precision agriculture. Data mining has played a vital role in
the classification of soil: Stockle et al. [9] used K-means
approaches for classifying soils in combination with GPS-based
technologies. The Support Vector Machine (SVM) was found to have
important applications in the classification of crops [10] and
yield prediction [11]. Yield prediction is further refined by
the recent concept of spatial autocorrelation [12]. Artificial
Neural Networks (ANN) play a very important role in the
development of precise forecasting and forewarning models of
plant diseases [13]. For identifying patterns in weather data,
Independent Component Analysis was found to be very effective
[14]. Integration of agricultural data, including pest scouting,
pesticide usage and meteorological records, was found useful for
optimizing pesticide usage [15]. Automatic Data Mining
techniques have recently been used for recognizing and grading
fruits [16].

In China, the relation between climate change, water resources
and agriculture was studied using Data Mining [17]. Data mining
is recognized as the most advanced concept for predicting market
fluctuation and price variability. Ding et al. [18] used the
Decision Tree technique to predict the market price of pigs in
China. Data mining also finds application in the prediction of
food-borne disease outbreaks [19] and the forecasting of water
consumption in agriculture [20]. Fuzzy set and interpolation
techniques have been applied to land suitability evaluation for
maize in Northern Ghana [21]. Thus Data Mining has proved to
have surprisingly broad application in nearly every issue
related to agriculture and allied fields.
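
To make the K-means idea mentioned above concrete, the following
is a minimal sketch of the standard Lloyd iteration in plain
Python. It is not the procedure of the cited soil study: the
(sand %, clay %) samples are hypothetical, the GPS component is
omitted, and the initial centroids are simply picked by hand.

```python
from math import dist

def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                updated.append(tuple(sum(col) / len(col) for col in zip(*members)))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:  # converged: no centroid moved
            break
        centroids = updated
    return centroids, assignment

# Hypothetical soil samples: (sand %, clay %).
samples = [(80, 10), (75, 12), (78, 8), (20, 55), (25, 60), (22, 58)]
centres, labels = kmeans(samples, centroids=[samples[0], samples[3]])
print(labels)
print(centres)
```

No class labels are supplied; the sandy and clayey groups emerge
from the distances alone, which is what makes clustering an
undirected task. In practice the result depends on the initial
centroids and on how the attributes are scaled.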

V. CONCLUSION

Economic and efficient use of resources requires timely and
sophisticated analysis of an integrated view of the data. The
growing gap between ever more powerful storage systems and the
users’ ability to effectively analyze and act on the information
they contain is being narrowed by the techniques of data mining.
Data mining is important for pattern recognition, forecasting,
knowledge discovery etc. in different business domains. It has a
wide application domain in almost every industry where data are
generated, which is why data mining is considered one of the
most important frontiers in database and information systems and
one of the most promising interdisciplinary developments in
Information Technology. The technology and its application in
the agri-allied sector are still at an initial stage and have
tremendous future prospects.